We want to repeat the analysis performed in the StatusReport.html file at any given date. To do this we need to solve two problems:
Find an easy and robust way to download the DTCC data and, potentially, add new sources of information.
Find a source of information for interest rate curve data that addresses the issue of intraday movements that we analysed in the previous report.
We use the dataonderivatives package to source the information from the various providers automatically. Please note that we can't use the original package available on CRAN: I therefore forked the repository on GitHub and modified it to account for the new link from which DTCC reports can be retrieved. This is why the next chunk of code contains the command remotes::install_github("DavideMagno/dataonderivatives").
The data that is downloaded with this package needs to be formatted before it can be used. That is why I created two functions in the DataIngestion.R file:
DownloadFromDTCC: downloads the DTCC report at given dates (even more than one at a time) and fixes the date and notional formatting
DownloadFromCME: downloads the CME report at given dates (even more than one at a time), fixes the date and time-to-maturity formatting and saves the floating rate type
Let’s see how to use these functions, with a date different from the 18th of June.
# remotes::install_github("DavideMagno/dataonderivatives")
source(here::here("R/DataIngestion.R"))
dates <- as.Date("2021-06-16")
data <- DownloadFromDTCC(dates)
data
The same can be done for the CME source; this time we download multiple days of data.
dates <- as.Date("2021-06-07") + lubridate::days(0:4)
data.cme <- DownloadFromCME(dates)
data.cme
We can notice that the CME data contains much less information than the DTCC one.
This data needs to be filtered for USD spot-starting fixed-floating interest rate swaps with a maturity above 1 year, which are the core of our analysis. It also needs to be formatted in a way that is readable by the SwapPricer package. This is performed by the following two functions:
SwapsFromDTCC: takes a vector of dates as input and downloads, wrangles, filters and puts the DTCC data in the right format for the SwapPricer
SwapsFromCME: takes a vector of dates as input and downloads, wrangles, filters and puts the CME data in the right format for the SwapPricer
Let’s see how to use them:
swaps.dtcc <- SwapsFromDTCC(dates)
swaps.dtcc
We do the same with the CME data:
swaps.cme <- SwapsFromCME(dates)
swaps.cme
The output is empty because there are no swaps with the specific characteristics we have chosen.
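As a sketch of the filtering criteria that SwapsFromDTCC and SwapsFromCME apply, the snippet below keeps only USD spot-starting fixed-floating trades with more than one year to maturity on a toy trade table; the column names (currency, type, trade.date, start.date, maturity.date) are illustrative, not the actual DTCC field names.

```r
library(dplyr)

# Toy trade table with hypothetical column names
trades <- tibble::tibble(
  currency      = c("USD", "EUR", "USD", "USD"),
  type          = c("Fixed-Float", "Fixed-Float", "Basis", "Fixed-Float"),
  trade.date    = as.Date("2021-06-16"),
  start.date    = as.Date("2021-06-18"),
  maturity.date = as.Date(c("2031-06-18", "2031-06-18", "2031-06-18", "2021-12-18"))
)

swaps <- trades |>
  filter(currency == "USD",                  # USD only
         type == "Fixed-Float",              # fixed vs floating leg
         start.date - trade.date <= 2,       # spot starting (at most T+2)
         maturity.date - start.date > 365)   # maturity above 1 year

nrow(swaps)  # 1: only the first toy trade survives
```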
We have seen in the StatusReport.html file that taking real-time data into account is extremely important for pricing: the deviation of market values from 0 is in fact in the space of 3 to 4 PV01s even for spot-starting swaps.
We have therefore followed two different routes:
Scraping Realtime Data from the SEB website
Using the same downloaded data from DTCC or CME as a source of par rates
To benchmark both approaches, we downloaded the intraday USD swap rates from Bloomberg for the 10, 20 and 30 year maturities during the week between the 5th and the 9th of July. These are saved in an Excel file in the Intraday Pricing folder.
intraday.rates <- readxl::read_excel(here::here("Data/Intraday Pricing/IntradaySwaps.xlsx"), skip = 3)
# Helper: extract one maturity's date/value column pair from the Bloomberg
# sheet and label it with the maturity and the data source
TableIntradayData <- function(columns, maturity, data) {
  data |>
    dplyr::select(columns) |>
    na.omit() |>
    dplyr::rename_all(~c("Date", "Value")) |>
    dplyr::mutate(Maturity = maturity,
                  Source = "Bloomberg") |>
    dplyr::select(Maturity, Value, Date, Source)
}
intraday.bbg <- purrr::map2_dfr(list(1:2, 4:5, 7:8), c(10, 20, 30), TableIntradayData, data = intraday.rates)
intraday.bbg
We scrape the SEB rates on a minute-by-minute basis and save them in a database.
# Open a connection to the MySQL database that stores the scraped SEB rates
ConnectToDB <- function(){
  db_user <- 'Rstudio'
  db_password <- 'Karelias123$'
  db_name <- 'swap_rates'
  db_host <- '167.71.3.141'
  db_port <- 3306
  RMySQL::dbConnect(RMySQL::MySQL(), user = db_user,
                    password = db_password, dbname = db_name,
                    host = db_host, port = db_port)
}
con <- ConnectToDB()
intraday.seb <- con |>
DBI::dbReadTable("usd_swap_rates") |>
dplyr::rename(Value = Price) |>
dplyr::filter(Date >= as.Date("2021-07-05"),
Date <= as.Date("2021-07-09"),
Maturity %in% c(10, 20, 30)) |>
dplyr::mutate(Date = as.POSIXct(paste(lubridate::ymd(Date), Time)),
Source = "SEB",
Date = Date + lubridate::hours(4)) |> # shift to UTC: US Eastern is on EDT (UTC-4) in summer
dplyr::select(-Time)
DBI::dbDisconnect(con)
## [1] TRUE
intraday.seb
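The fixed 4-hour shift works for this July sample but would be wrong in winter, when US Eastern moves from EDT (UTC-4) to EST (UTC-5). A more robust sketch (not the code used in this report) declares the timezone explicitly and lets lubridate handle daylight saving:

```r
library(lubridate)

# Declare the SEB timestamp as US Eastern local time, then convert to UTC;
# with_tz() picks the correct offset (EDT or EST) for the date automatically
local <- as.POSIXct("2021-07-05 13:30:00", tz = "America/New_York")
utc   <- with_tz(local, tzone = "UTC")
format(utc, "%H:%M")  # "17:30": in July the offset is 4 hours
```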
We collate the data into one dataframe.
intraday <- intraday.seb |>
dplyr::bind_rows(intraday.bbg)
We plot the data by maturity, distinguishing between SEB and Bloomberg.
library(ggplot2)
intraday |>
ggplot(aes(x = Date, y = Value, colour = Source)) +
geom_line() +
facet_grid(rows = vars(Maturity), scales = "free_y")
We can notice that the intraday SEB data fits the Bloomberg data pretty well, but the SEB website publishes data during European trading hours only. This means it misses the market movements after ~5pm UTC.
We reuse the information downloaded from DTCC via the SwapsFromDTCC function to extract a par swap curve that can be used for pricing. The idea behind its construction is that, since these come from actual quotes, some swaps will have a valuation below 0 and others above 0, but the whole set of swaps will in general be close to par. We will use two methodologies to summarise the information by bucket: the mean of the strikes and the median of the strikes.
swaps.dtcc <- swaps.dtcc |>
  dplyr::mutate(start.date = as.Date(start.date, format = "%d/%m/%Y"),
                maturity.date = as.Date(maturity.date, format = "%d/%m/%Y"),
                # time to maturity in (approximate) years
                time.to.mat = round((maturity.date - start.date)/365, 0) |>
                  as.numeric(),
                # map each time to maturity to the closest standard tenor:
                # yearly buckets up to 12y, then 5-yearly buckets up to 50y
                Bucket = cut(as.numeric(time.to.mat),
                             breaks = c(seq(from = 0.5, to = 12.5, by = 1),
                                        seq(from = 17.5, to = 52.5, by = 5)),
                             labels = c(1:12, seq(from = 15, to = 50, by = 5)),
                             right = FALSE),
                Bucket = as.character(Bucket) |> as.numeric())
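The cut() call maps every time to maturity to the closest standard tenor: yearly buckets up to 12 years, then 5-yearly buckets up to 50. A small standalone check of the same breaks and labels:

```r
# Midpoint-style breaks: [0.5, 1.5) -> 1y, ..., [11.5, 12.5) -> 12y,
# then [12.5, 17.5) -> 15y, [17.5, 22.5) -> 20y, ..., [47.5, 52.5) -> 50y
ttm <- c(1.2, 7, 12, 14, 30, 49)
bucket <- cut(ttm,
              breaks = c(seq(from = 0.5, to = 12.5, by = 1),
                         seq(from = 17.5, to = 52.5, by = 5)),
              labels = c(1:12, seq(from = 15, to = 50, by = 5)),
              right = FALSE)
as.numeric(as.character(bucket))  # 1 7 12 15 30 50
```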
swap.curve <- swaps.dtcc |>
dplyr::group_by(Bucket) |>
dplyr::summarise(Strike.median = median(strike),
Strike.mean = mean(strike))
swap.curve
We now plot the curve derived over the information from DTCC:
swap.curve <- swap.curve |>
tidyr::pivot_longer(-Bucket, names_to = "type", values_to = "Strike")
swaps.dtcc |>
ggplot(aes(x = time.to.mat, y = strike)) +
geom_point(alpha = 0.2) +
geom_point(data = swap.curve, aes(x = Bucket, y = Strike, colour = type),
size = 2) +
theme_bw() +
labs(x = "Time to maturity", y = "Rate") +
scale_y_continuous(labels = scales::percent)
We notice an outlier in the DTCC data: a swap with a strike of 20%! The use of the median to derive the swap curve helps manage issues like this, but to get a clearer picture we can try to isolate and remove this trade. (We need to analyse further whether there are additional criteria to filter on to obtain clean at-par fixed-to-float swaps.)
id.outlier <- swaps.dtcc |>
dplyr::filter(strike > 0.05) |>
dplyr::pull(ID)
`%notin%` <- Negate(`%in%`)
swaps.dtcc <- swaps.dtcc |>
dplyr::filter(ID %notin% id.outlier)
swaps.dtcc |>
ggplot(aes(x = time.to.mat, y = strike)) +
geom_point(alpha = 0.2) +
geom_point(data = swap.curve, aes(x = Bucket, y = Strike, colour = type),
size = 2) +
theme_bw() +
labs(x = "Time to maturity", y = "Rate") +
scale_y_continuous(labels = scales::percent)
With the outlier removed, it is quite clear how much the "mean" curve was impacted by it. Using the median allows us to avoid wrangling these extreme cases before we start the pricing routine, and still obtain a robust pricing curve.
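The robustness of the median versus the mean can be checked on a toy vector of strikes containing a single bad print:

```r
strikes <- c(0.010, 0.011, 0.012, 0.013, 0.20)  # one 20% outlier
mean(strikes)    # 0.0492: dragged far away from the ~1% cluster
median(strikes)  # 0.012: unaffected by the outlier
```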
We now have a swap par curve that we can use for pricing.
In order to test this, we need to bootstrap the curve we have just generated. TBC